arXiv: 1501.07342v2 [q-bio.MN] 7 May 2015 
Information theory is gaining popularity as a tool to characterize performance of biological systems. However, information is commonly quantified without reference to whether or how a system could extract and use it; as a result, information-theoretic quantities are easily misinterpreted. Here we take the example of pattern-forming developmental systems which are commonly structured as cascades of sequential gene expression steps. Such a multi-tiered structure appears to constitute sub-optimal use of the positional information provided by the input morphogen because noise is added at each tier. However, the conventional theory fails to distinguish between the total information in a morphogen and information that can be usefully extracted and interpreted by downstream elements. We demonstrate that quantifying the information that is accessible to the system naturally explains the prevalence of multi-tiered network architectures as a consequence of the noise inherent to the control of gene expression. We support our argument with empirical observations from patterning along the major body axis of the fruit fly embryo. Our results exhibit the limitations of the standard information-theoretic characterization of biological signaling and illustrate how they can be resolved.
As an inspiring example of productive collaboration between computer science, physics and biology, information theory is gaining popularity as a tool to characterize the performance of biological systems. Although it may not have become the “general calculus for biology” predicted by Johnson in his 1970 review [1], the scope of its applications has been steadily expanding. It ranges from the earliest work measuring the information content in DNA, RNA and proteins to topics like neuroscience, collective behavior, ecology, developmental biology, genetic regulation and signaling […].

Specifically in the context of biochemical signaling, several recent reviews make compelling arguments that the mutual information between input and output of a signaling pathway is not just a useful quantity, but is in fact the “only natural framework” for characterizing the performance of such systems. However, implicit in these arguments is the assumption that the “output” in question is the final target of signaling, the functionally relevant phenotypic trait. Unfortunately, in biological applications of information theory, information content is usually assessed for signals that constitute intermediate steps. These are most commonly transcription factors, for example, NF-κB [6, 7] or Drosophila patterning cues [8]. Such signals, however, still need to be interpreted by downstream processes. Therefore, the information they carry is useful only to the extent that it can be extracted and used by the system. As we will demonstrate, failure to recognize this can easily cause information-theoretic quantities to be misinterpreted.

To show this, we take the example of gradient-mediated patterning circuits. For a complex multicellular organism, the reliability of its developmental program directly determines the probability of reaching reproductive age. Therefore, low error rate and/or high error tolerance are likely to be key determinants of the structures of developmental circuits [9, 10]. Why, then, are so many patterning circuits structured as a cascade of several signaling steps, each of which is susceptible to loss of information due to noise inherent in biological control? We will see that treating the information content of patterning cues as a one-size-fits-all measure of system performance erroneously predicts that a single-step readout strategy should be dominant in development. To understand the advantages of the multi-tiered architectures observed in real systems, it is essential to distinguish between the total information in a morphogen and information that can be usefully extracted and interpreted. We support our reasoning with experiments on the well-studied segmentation gene network responsible for anterior-posterior patterning in the Drosophila embryo.

Multi-tier architecture in gradient-mediated patterning. In many developing embryonic systems, cellular identities are conferred by graded input signals that induce dose-dependent gene expression programs as outputs [11, 12]. Such graded inputs, termed morphogens, often function as diffusible molecules produced by a localized expression source […]. Localized expression generates concentration gradients in a field of otherwise naive and identical cells (presented in simplified form as a one-dimensional array in Fig. 1). Cells activate specific expression programs in response to the local morphogen concentration c(x). When c correlates closely with distance x from the source, such gradients carry a large amount of “positional information” […], quantified via the mutual information I[c(x), x] […, 16].

In principle, a morphogen gradient carrying sufficient information could induce in each cell the gene expression program appropriate for its position, thus generating the required spatial arrangement of cell fates [17] (Fig. 1A). In the most straightforward model, assuming the input morphogen is sufficiently reproducible [18], local morphogen concentration is directly interpreted by each cell. That is, the local input activates all genes required at a given position, with no additional cycles of gene expression modulation. A central tenet of information theory, the information processing inequality, states that no transmission or processing step can increase the total information contained in a signal. Direct decoding might therefore be expected to dominate in early development as the optimal strategy for transmitting positional information. This expectation seems all the more valid given the widespread observation that the processes of transcription and translation exhibit considerable intrinsic variability, or noise [19, 20]. Thus information loss in gene regulatory processes should be particularly notable.

Therefore, from the perspective of information theory, it is surprising that many gradient-based systems exhibit a multi-tiered architecture in which reiterated cycles of transcription and translation are required to attain patterning goals (illustrated in Fig. 1B). For example, in the vertebrate central nervous system, the unpatterned neuroectoderm exhibits a graded distribution of multiple diffusible signaling molecules. These signals subdivide the prospective brain into relatively large fore-, mid-, and hindbrain territories, which are then segmented into smaller subunits by additional signaling activity [21–23].

Similar patterns of broad subdivision followed by short-range refinement are found in several other systems:
- the specification of the vertebrate neural crest by reiterated rounds of extracellular signaling [24];
- the formation of segmented muscle precursors (somites) by FGF and Notch, followed by short-range Ephrin activity [25, 26];
- the dorsal-ventral patterning of the Drosophila body axis, first by a gradient of NF-κB activity (the NF-κB protein is called Dorsal in Drosophila) and then by members of the BMP family of secreted signaling molecules [27, 28];
- also in the fruit fly, the patterning of the anterior-posterior (AP) axis by gradients of diffusible transcription factors within the shared cytoplasm of the nuclear syncytium [13, 29, 30].

FIG. 1. Direct versus multi-tiered decoding strategies for gradient-mediated patterning. (A) Direct decoding: to reduce noise introduced by intrinsically variable gene expression, patterning proceeds through a single cycle of transcription and translation. Differences in morphogen input c(x) directly specify gene expression programs A–F along axis x. (B) Multi-tiered decoding: the morphogen first elicits expression of short-range diffusible factors in domains spanning several cells. These gene products then induce programs A–F through a second cycle of transcription/translation. The added step introduces additional gene expression noise, reducing patterning information compared to direct decoding (A).

These examples and others illustrate a common theme: long-range signaling gradients subdivide a large field into smaller domains, within which the patterned expression of secondary factors establishes elaborated patterns (Fig. 1B). Each cycle of transcription and translation introduces more noise. The widespread use of the multi-tiered architecture therefore appears to conflict with the expectation that development should favor circuits exhibiting efficient information utilization.

This apparent conflict arises because Shannon’s information content of a signal [16] has two important limitations.

First, the information content of a patterning cue or other biological signal is defined locally in space and time. Its interpretation, by contrast, is non-local: it occurs over time and frequently involves diffusive signals. For this reason, the naive application of the information processing inequality in these systems is incorrect. The local, instantaneous information content in a signal does not in fact provide an upper bound for the performance of downstream processes interpreting this signal […].

Second, the same amount of information can be encoded in formats that are more or less easy for the system to access, since the interpreting circuit is itself subject to noise. Thus, the local information content of a signal is neither an upper bound nor a fair estimate of the amount of information this signal can “transmit” to the downstream circuit. This is well illustrated by recent experimental work on the ERK, calcium and NF-κB pathways [7, …]. If the output of any of these pathways is reduced to a single scalar, it is found to transmit very little information about the input. If the output is treated as a dynamical variable, its apparent information content increases considerably [32]. Neither of these quantities, however, can be interpreted before it is established what fraction of that information can actually be extracted and used by the system.

Here we use a simplified model to illustrate these limitations of what we call “raw” information content, contrasting it with the “accessible information” that we introduce.
Results


An abstract gradient response problem. A one-dimensional array of cells i located at positions $0 < x_i < L$ is exposed to a noisy linear gradient of an input morphogen c(x) spanning the range $[0, c_{\max}]$. To build intuition, we assume the noise of input c(x) to be Gaussian, of constant magnitude $\sigma_0$, and uncorrelated between cells¹:

$$c(x_i) \equiv c_i = (x_i/L)\,c_{\max} + \epsilon_i,$$

where the $\epsilon_i$ are i.i.d., drawn from a Gaussian of width $\sigma_0$ (Fig. 2A). Cells respond to morphogen c(x) by modulating gene expression through intrinsically noise-prone signal transduction and transcription/translation processes. We model this response as a composition of three elementary operations that constitute the “toolkit” with which cells can access and process information contained in patterning cues: access, amplify, and average.

Let $g^{\rm out}$ be a gene product whose expression is controlled by c(x). The simplest readout is achieved by placing gene $g^{\rm out}$ under the control of a promoter that is responsive to c and by accumulating the output protein for some time τ. In our model, we express the amount of $g^{\rm out}$ produced during this time by a cell i as $g_i^{\rm out} = F(c_i^{\rm est})$. Here $c_i^{\rm est}$ is a noisy estimate of the true concentration $c_i$ that the system could obtain in time τ (“access”), and F is some deterministic input-output function (“amplify”). For simplicity, we first consider F to be pure linear amplification with coefficient A, denoted $F_A$, i.e., $F_A(c) = Ac$.

The “access” operation is the key element of our framework. Specifically, we write

$$c_i^{\rm est} = c_i + \eta_i,$$

where $\eta_i$ reflects the intrinsic stochasticity of transcription and, in principle, many other noise sources. Here we model $\eta_i$ simply as being drawn from a Gaussian distribution of width $\eta_0$. In other words, we postulate that each “access” operation takes time τ and comes at the price of corrupting the signal with extra noise of magnitude $\eta_0$.
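A minimal numerical sketch of the three toolkit operations may help fix ideas. It assumes NumPy, Gaussian noise exactly as stated above, and illustrative parameter values chosen only for this sketch, not taken from the text. The names `access`, `amplify` and `average` are hypothetical labels for the operations, not code from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

def access(c, eta0, rng):
    """'Access': noisy estimate c_est = c + eta, with eta ~ N(0, eta0^2) i.i.d. per cell."""
    return c + rng.normal(0.0, eta0, size=np.shape(c))

def amplify(c, A):
    """'Amplify': pure linear input-output function F_A(c) = A * c."""
    return A * np.asarray(c)

def average(samples):
    """'Average': mean over independent measurements stacked along axis 0."""
    return np.mean(samples, axis=0)

# Noisy linear gradient c_i = (x_i / L) * c_max + eps_i, eps_i ~ N(0, sigma0^2)
# (illustrative values chosen for this sketch)
L, c_max, sigma0, n_cells = 1.0, 1.0, 0.02, 200
x = np.linspace(0.0, L, n_cells, endpoint=False)
c = (x / L) * c_max + rng.normal(0.0, sigma0, size=n_cells)

# A single readout: g_out_i = F_A(c_est_i)
A, eta0 = 3.0, 0.05
g_out = amplify(access(c, eta0, rng), A)

# Averaging N_eff independent accesses of the same (frozen) input c
# shrinks the leftover readout noise by roughly 1/sqrt(N_eff)
N_eff = 100
readouts = np.stack([access(c, eta0, rng) for _ in range(N_eff)])
residual = average(readouts) - c
print(residual.std(), eta0 / np.sqrt(N_eff))
```

Only the access noise is averaged here, because the input realization is held fixed. In the two patterning strategies defined in the text, fresh input noise also enters each of the $N_{\rm eff}$ measurements.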

The final toolkit operation is averaging. Patterning systems typically act over durations that are long (hours) compared to the time required to synthesize mRNA and protein (minutes). Cells can therefore perform temporal averaging by allowing stable gene products to accumulate [35]: if T is the time available for patterning, the system can effectively perform T/τ access operations. In addition, the production of soluble factors that can be shared between cells gives rise to spatial averaging [35, 36]. Both types of averaging offer the system some capacity to perform multiple measurements of the input. We capture this formally by an averaging operator $\mathcal{G}_{N_{\rm eff}}$, where $N_{\rm eff}$ indicates the effective number of independent measurements. By definition, applying $\mathcal{G}_{N_{\rm eff}}$ to a morphogen reduces expression fluctuations by a factor $1/\sqrt{N_{\rm eff}}$.

¹ The assumption of uncorrelated noise is intentionally strong. In a real system, correlated noise can be introduced, for example, by variations in the total amount of morphogen deposited maternally. These fluctuations, which cannot be reduced by averaging, lead to imperfect reproducibility of morphogen activity at a given location across multiple embryos. Much work has focused on investigating the limitations imposed on patterning by this type of fluctuation […]. In contrast, our model is applicable for understanding the effects of imperfect precision of gene expression (at a given location within the same embryo). The distinction between “raw” and “accessible” information does not rely on the assumption of uncorrelated noise.

FIG. 2. The two patterning strategies. A: In the direct strategy, target genes are controlled directly by c. B: The two-tier strategy involves a second patterning factor $c^{(A)}$; target genes are separated from the input by two tiers of “access” operations. Left, raw information content. Right, accessible information content.

We distinguish between two patterning strategies.

In the first (“direct strategy”; Fig. 2A), cell-fate-specific target genes are controlled directly by c and no other patterning factors are involved. Any available averaging mechanisms are applied to c itself.

In the second (“two-tier”) strategy, cells perform an amplifying readout of c with input-output function $F_A$ to establish a spatial profile of a second factor $c^{(A)}$ (Fig. 2B). The patterning time T is spent on accumulating and averaging $c^{(A)}$.

Mathematically, in the two scenarios, the cell-fate-specific target genes are controlled by:

$$c^{(0)} = \mathcal{G}_{N_{\rm eff}}[c] \qquad \text{(direct strategy)} \tag{1}$$

$$c^{(A)} = \mathcal{G}_{N_{\rm eff}}[F_A(c + \eta)] \qquad \text{(two-tier strategy)} \tag{2}$$

We now ask: when, if ever, does the noisy amplification step of the two-tier strategy provide a benefit to the system?
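The two strategies can be simulated directly as a sanity check on the noise levels used below. This sketch assumes that each of the $N_{\rm eff}$ measurements sees independent input noise ε. In the two-tier case it also sees independent access noise η. $\mathcal{G}_{N_{\rm eff}}$ is implemented as a plain mean, which is one simple reading of the averaging operator. Parameter values and function names are hypothetical.

```python
import numpy as np

def direct_profile(mean_c, sigma0, N_eff, rng):
    # Eq. (1): c^(0) = G_{N_eff}[c]; every measurement sees fresh input noise
    draws = mean_c + rng.normal(0.0, sigma0, size=(N_eff, mean_c.size))
    return draws.mean(axis=0)

def two_tier_profile(mean_c, sigma0, eta0, A, N_eff, rng):
    # Eq. (2): c^(A) = G_{N_eff}[F_A(c + eta)] with F_A(u) = A * u
    c = mean_c + rng.normal(0.0, sigma0, size=(N_eff, mean_c.size))
    eta = rng.normal(0.0, eta0, size=c.shape)
    return (A * (c + eta)).mean(axis=0)

rng = np.random.default_rng(1)
c_max, sigma0, eta0, A, N_eff = 1.0, 0.05, 0.1, 4.0, 64   # illustrative values
mean_c = np.linspace(0.0, c_max, 2000)

c0 = direct_profile(mean_c, sigma0, N_eff, rng)
cA = two_tier_profile(mean_c, sigma0, eta0, A, N_eff, rng)

# Empirical noise vs. sigma0/sqrt(N_eff) and A*sqrt(sigma0^2 + eta0^2)/sqrt(N_eff)
print("direct:  ", np.std(c0 - mean_c), sigma0 / np.sqrt(N_eff))
print("two-tier:", np.std(cA - A * mean_c), A * np.hypot(sigma0, eta0) / np.sqrt(N_eff))
```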
Standard information-theoretic considerations do not explain the benefits of amplification. Consider the positional information carried by a linear morphogen c(x) with dynamic range $c_{\max}$ and noise $\sigma_0$; we call this the “raw information content” of a gene expression profile. It is given by a logarithmic expression derived in the Supplementary Information, and it depends only on the ratio $\phi = c_{\max}/\sigma_0$. For convenience, we denote it

$$I_{\rm raw}[c(x), x] = \mathcal{I}(\phi), \qquad \phi = \frac{c_{\max}}{\sigma_0},$$

where $\mathcal{I}(\phi)$ is an increasing function of φ.
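The explicit form of $\mathcal{I}$ is given in the Supplementary Information and is not reproduced here. As a stand-in, the sketch below evaluates the mutual information numerically for the model as stated, using $I = h(c) - h(c\,|\,x)$. It assumes that cell positions are uniformly distributed on $[0, L]$, which is an assumption of this sketch. It reports bits, the unit quoted later in the text. For $\phi \gg 1$ the result approaches $\log_2(\phi/\sqrt{2\pi e})$. The function name is hypothetical.

```python
import math
import numpy as np

def raw_information_bits(phi, n_grid=20001):
    """Mutual information I[c(x), x] in bits for x ~ Uniform[0, L] and
    c = (x/L) c_max + N(0, sigma0^2), as a function of phi = c_max / sigma0.
    Units are chosen so that c_max = 1, hence sigma0 = 1/phi."""
    sigma = 1.0 / phi
    c = np.linspace(-8 * sigma, 1 + 8 * sigma, n_grid)
    Phi = np.vectorize(lambda u: 0.5 * (1.0 + math.erf(u / math.sqrt(2.0))))
    # Density of c: a uniform ramp smeared by Gaussian noise
    p = Phi(c / sigma) - Phi((c - 1.0) / sigma)
    dc = c[1] - c[0]
    mask = p > 0
    h_c = -np.sum(p[mask] * np.log2(p[mask])) * dc                  # h(c)
    h_c_given_x = 0.5 * math.log2(2 * math.pi * math.e * sigma**2)  # h(c|x)
    return h_c - h_c_given_x

for phi in (1.0, 10.0, 100.0):
    print(phi, raw_information_bits(phi))
```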

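Let us compare the two patterning strategies from the point of view of the raw information content carried by the controlling signal. In the direct strategy (1), the application of $\mathcal{G}_{N_{\rm eff}}$ reduces the input noise to $\sigma_0/\sqrt{N_{\rm eff}}$. The controlling signal therefore carries $I_{\rm raw} = \mathcal{I}\big(\sqrt{N_{\rm eff}}\,c_{\max}/\sigma_0\big)$ bits of raw information. In the two-tier strategy (2), the amplified profile $c^{(A)}$ has dynamic range $A c_{\max}$ and noise $\xi_A = A\sqrt{(\sigma_0^2 + \eta_0^2)/N_{\rm eff}}$. Its raw information content is therefore

$$I_{\rm raw}[c^{(A)}] = \mathcal{I}\!\left(\frac{A\,c_{\max}}{\xi_A}\right) = \mathcal{I}\!\left(\frac{\sqrt{N_{\rm eff}}\,c_{\max}}{\sqrt{\sigma_0^2 + \eta_0^2}}\right). \tag{3}$$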

Averaging mitigates the loss of positional information when using a noisy readout [36]. If $N_{\rm eff}$ is sufficiently large, the amplified and averaged profile carries even more information than the original input. Comparing the argument of (3) with $c_{\max}/\sigma_0$ shows that this happens when $N_{\rm eff} > 1 + \eta_0^2/\sigma_0^2$. (Note that the information processing inequality is not violated, as it states only that the output cannot carry more information than $N_{\rm eff}$ independent copies of the input.) Nevertheless, applying averaging directly to the input (the direct strategy) always yields more raw information, since $\sigma_0^2/N_{\rm eff} < (\sigma_0^2 + \eta_0^2)/N_{\rm eff}$. Thus, the multi-step scenario appears inferior to a direct readout.

In real systems, the three operations we treat as independent may be mechanistically linked. For example, c(x) may be an intracellular factor while spatial averaging requires a small diffusible molecule. In that case, performing an extra readout can provide access to an otherwise unavailable averaging mechanism. In our model, averaging simply reduces expression noise and is obviously beneficial. By assuming that the direct and two-tier strategies can benefit from equal amounts of averaging, we can focus specifically on the effect of signal amplification.

Multi-tier patterning proceeds through rounds of amplification: small differences in input result in large differences in gene expression, establishing increasingly sharp boundaries that delimit expression domains. Yet in our expression (3) for the information content of the amplified profile $c^{(A)}$, the amplification factor A cancels out. Thus, considerations based on raw information content fail to explain the prevalence of signal amplification.
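Because $\mathcal{I}$ is increasing, comparing raw information reduces to comparing the arguments of $\mathcal{I}$. The short check below evaluates those arguments for the direct strategy and for the amplified profile of Eq. (3). It confirms that A drops out and that the direct strategy is never worse. It is plain Python, the parameter values are illustrative, and the function names are hypothetical.

```python
import math

def snr_raw_direct(c_max, sigma0, N_eff):
    # Direct strategy: averaging reduces input noise to sigma0 / sqrt(N_eff)
    return c_max / (sigma0 / math.sqrt(N_eff))

def snr_raw_two_tier(c_max, sigma0, eta0, A, N_eff):
    # Eq. (3): range A*c_max, noise xi_A = A*sqrt(sigma0^2 + eta0^2)/sqrt(N_eff)
    xi_A = A * math.hypot(sigma0, eta0) / math.sqrt(N_eff)
    return A * c_max / xi_A

c_max, sigma0, eta0, N_eff = 1.0, 0.05, 0.05, 100
for A in (1.0, 3.0, 10.0):
    # The two-tier value is the same for every A; the direct value is larger
    print(A, snr_raw_direct(c_max, sigma0, N_eff),
          snr_raw_two_tier(c_max, sigma0, eta0, A, N_eff))
```

With these values, the two-tier argument also exceeds the original input's $c_{\max}/\sigma_0$. This illustrates that averaging can make the amplified profile more informative than the input while still leaving it behind the direct strategy.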



FIG. 3. A: Noisy amplification can increase accessible information even if raw information is reduced. Inner error bars are the signal variability; they increase when amplification adds new noise, reducing $I_{\rm raw}$. Outer error bars represent the signal observed by the noisy cell machinery (corrupted by noise $\eta_0$). After amplification, the relative importance of $\eta_0$ is reduced, increasing $I_{\rm acc}$. B: The “segmentation” input-output function for integer A (here A = 3) preserves the dynamic range of morphogen concentration. Locations such as those indicated by dots now have identical expression levels of $z^{(A)}$ (the y axis), but can be distinguished using the input morphogen c (the x axis on this plot).


The benefits of the multi-tiered strategy lie in making the “raw” information more accessible. The benefits of amplification and the advantages of the multi-tier strategy become clear once we observe that, due to the intrinsic noise in the regulatory readout, the raw information content is an inadequate measure of a morphogen’s usefulness to the system. The purpose of a morphogen is to activate downstream processes. The relevant quantity is therefore not the amount of information a morphogen carries, but the amount of information it can transmit to its downstream targets. Since biological control is intrinsically noisy, the two quantities are distinct.

Our model was designed to make this particularly clear. The system can never access the true concentration c, but only a noisy estimate $c^{\rm est}$, so $I_{\rm raw}[c]$ is beyond the system’s reach. We define the accessible information in a morphogen, $I_{\rm acc}$, as the amount of information the system can access in time τ:

$$I_{\rm acc}[c] = I_{\rm raw}[c^{\rm est}] = I_{\rm raw}[c + \eta], \tag{4}$$

where η, again, is a Gaussian noise of magnitude $\eta_0$ within our model.

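The amount of accessible information provided by the direct strategy (Fig. 2A) is given by

$$I_{\rm acc}[c^{(0)}] = \mathcal{I}\!\left(\frac{c_{\max}}{\sqrt{\sigma_0^2/N_{\rm eff} + \eta_0^2}}\right), \tag{5}$$

whereas for the amplified profile $c^{(A)}$ it is

$$I_{\rm acc}[c^{(A)}] = \mathcal{I}\!\left(\frac{A\,c_{\max}}{\sqrt{A^2(\sigma_0^2 + \eta_0^2)/N_{\rm eff} + \eta_0^2}}\right) = \mathcal{I}\!\left(\frac{c_{\max}}{\sqrt{(\sigma_0^2 + \eta_0^2)/N_{\rm eff} + \eta_0^2/A^2}}\right). \tag{6}$$

The amplification factor A no longer cancels out in (6). Amplifying the dynamic range is beneficial, since it reduces the relative importance of the intrinsic readout noise (Fig. 3A). Comparing (5) and (6), we find that the extra tier of noisy amplification is beneficial if and only if

$$\frac{1}{N_{\rm eff}} + \frac{1}{A^2} < 1. \tag{7}$$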
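Since $\mathcal{I}$ is increasing, comparing (5) and (6) amounts to comparing the noise terms under the square roots. A short derivation of (7), assuming $\eta_0 > 0$, reads:

$$
\begin{aligned}
I_{\rm acc}[c^{(A)}] > I_{\rm acc}[c^{(0)}]
&\iff \frac{\sigma_0^2 + \eta_0^2}{N_{\rm eff}} + \frac{\eta_0^2}{A^2} < \frac{\sigma_0^2}{N_{\rm eff}} + \eta_0^2 \\
&\iff \frac{\eta_0^2}{N_{\rm eff}} + \frac{\eta_0^2}{A^2} < \eta_0^2 \\
&\iff \frac{1}{N_{\rm eff}} + \frac{1}{A^2} < 1 .
\end{aligned}
$$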

Note that condition (7) is never satisfied if $N_{\rm eff} = 1$ (no averaging) or A = 1 (no amplification). Intuitively, the patterning system is a mechanism that invests some effort into making a careful measurement ($N_{\rm eff} > 1$). It then encodes this information in a more accessible format, where steeper concentration changes (A > 1) can be interpreted with a faster, and therefore noisier, readout. This mechanism is useful precisely because regulatory readout is intrinsically noisy; otherwise, direct readout would have been the better strategy. In other words, to understand the purpose of the patterning system, it is essential to distinguish between the total information in a morphogen and information that can be usefully extracted and interpreted.
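Condition (7) can also be checked numerically against a direct comparison of (5) and (6). The sketch below compares the arguments of $\mathcal{I}$ (valid because $\mathcal{I}$ is increasing) on an illustrative grid of parameters chosen for this sketch. The helper names are hypothetical.

```python
import math
from itertools import product

def snr_acc_direct(c_max, sigma0, eta0, N_eff):
    # Argument of I in Eq. (5): averaged input, then one readout with noise eta0
    return c_max / math.sqrt(sigma0**2 / N_eff + eta0**2)

def snr_acc_two_tier(c_max, sigma0, eta0, A, N_eff):
    # Argument of I in Eq. (6): range A*c_max, noise sqrt(A^2 (sigma0^2 + eta0^2)/N_eff + eta0^2)
    return A * c_max / math.sqrt(A**2 * (sigma0**2 + eta0**2) / N_eff + eta0**2)

def amplification_helps(A, N_eff):
    # Condition (7), valid for eta0 > 0
    return 1.0 / N_eff + 1.0 / A**2 < 1.0

# (6) exceeds (5) exactly when the two-tier argument is larger;
# verify on a small grid that this matches condition (7)
grid = product([1.0, 1.5, 2.0, 5.0], [1, 2, 4, 50], [0.01, 0.1], [0.05, 0.5])
for A, N_eff, sigma0, eta0 in grid:
    better = snr_acc_two_tier(1.0, sigma0, eta0, A, N_eff) > snr_acc_direct(1.0, sigma0, eta0, N_eff)
    assert better == amplification_helps(A, N_eff), (A, N_eff, sigma0, eta0)
print("condition (7) agrees with Eqs. (5)-(6) on the grid")
```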

Multiple tiers improve gradient interpretation even when raw information decreases. So far we considered the information content (raw or accessible) in each tier separately. However, in principle, downstream processes could access all patterning cues and not simply the final tier [38, 39]. As a result, extra readout tiers can be beneficial even when they carry very little information on their own.

To see this, consider the input-output function depicted in Fig. 3B. In some respects, it is more realistic than the purely amplifying linear readout $F_A$ considered above, since real patterning systems must operate within a limited global dynamic range of morphogen concentrations. Let $z^{(A)}$ be the morphogen profile established by the new $\tilde{F}_A$-shaped readout of c. It has noise magnitude $\xi_A$ (the same as the noise in $c^{(A)}$), but is folded onto itself A times, reminiscent of the spatially reiterated expression of genes involved in Drosophila axis segmentation. Repeatedly using the same output values at multiple positions naturally reduces the mutual information between the output concentration and position:

$$
\begin{aligned}
I_{\rm raw}[z^{(A)}] &= I_{\rm raw}[c^{(A)}] - \ln A, \\
I_{\rm acc}[z^{(A)}] &= I_{\rm acc}[c^{(A)}] - \ln A.
\end{aligned}
$$
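The text does not specify the exact shape of the folded readout in Fig. 3B. The sketch below therefore assumes a sawtooth $\tilde{F}_A(c) = (A c) \bmod c_{\max}$. This is a hypothetical choice; a zig-zag fold would serve the same purpose here. The sketch also assumes noiseless inputs and integer A. It shows that $z^{(A)}$ alone reuses each output value A times, which is the origin of the $-\ln A$ term. Together with a coarse reading of c, however, $z^{(A)}$ pins down c.

```python
import numpy as np

def fold(c, A, c_max=1.0):
    # Hypothetical "segmentation" readout: a sawtooth with A teeth that keeps
    # the output in [0, c_max), i.e. the same dynamic range as the input c
    return np.mod(A * np.asarray(c), c_max)

A, c_max = 4, 1.0
c = np.arange(16) / 16.0          # noiseless input at 16 equally spaced positions
z = fold(c, A, c_max)

# z alone: each output value is reused A times (an A-to-1 map, hence the -ln A)
print(z.reshape(A, -1))

# z plus a coarse reading of c (which tooth the cell sits on) recovers c exactly
tooth = np.floor(A * c / c_max)
c_recovered = (tooth * c_max + z) / A
print(np.allclose(c_recovered, c))
```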

However, the A locations with identical concentrations of $z^{(A)}$ are made distinguishable by the original morphogen c (Fig. 3B). Therefore, the joint information that the original and the amplified profiles together provide about a cell’s location is the same for $\tilde{F}_A$ as it was for $F_A$:

$$I[\{c, z^{(A)}\}, x] = I[\{c, c^{(A)}\}, x].$$

We now replace the information content of a single profile by this joint information. Our argument demonstrating that amplification increases accessible information can then be repeated verbatim [40], and we again find that the extra readout is beneficial as long as (7) is satisfied. Note, however, that on its own, $z^{(A)}$ may carry less information than the original morphogen c. The easiest way to see this is to compare their noise levels:

$$\frac{\xi_A}{\sigma_0} = \frac{A}{\sqrt{N_{\rm eff}}}\sqrt{1 + \frac{\eta_0^2}{\sigma_0^2}}.$$

If the effect of amplification is stronger than that of averaging ($A > \sqrt{N_{\rm eff}}$), we find $\xi_A/\sigma_0 > 1$. In this scenario, the amplified profile $z^{(A)}$ has the same dynamic range but lower precision than the original morphogen c. On its own, it therefore carries less information (whether raw or accessible).

Evaluating the usefulness of a particular cue from an information-theoretic standpoint can thus lead to misleading results, unless all other relevant cues (which are often hard to establish) are taken into account simultaneously. We have shown that systems can benefit from multi-tiered interpretation even in cases where intermediate steps occur at a net loss of information, increasing noise.
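The noise comparison above is simple arithmetic. The sketch below evaluates it for one illustrative parameter set, chosen for this sketch, in which amplification outpaces averaging. For that parameter set, $z^{(A)}$ is noisier than c while condition (7) still holds. The function name is hypothetical.

```python
import math

def noise_ratio(A, N_eff, sigma0, eta0):
    # xi_A / sigma0 = (A / sqrt(N_eff)) * sqrt(1 + eta0^2 / sigma0^2)
    return (A / math.sqrt(N_eff)) * math.sqrt(1.0 + (eta0 / sigma0) ** 2)

A, N_eff, sigma0, eta0 = 10.0, 16, 0.05, 0.05
ratio = noise_ratio(A, N_eff, sigma0, eta0)   # 2.5 * sqrt(2), about 3.54
helps = 1.0 / N_eff + 1.0 / A**2 < 1.0        # condition (7): 1/16 + 1/100 < 1
print(ratio, helps)
```

Because the square-root factor is never below 1, the ratio exceeds 1 whenever $A > \sqrt{N_{\rm eff}}$, regardless of $\eta_0$.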

The multi-tier structure of Drosophila segment patterning increases information accessibility. In this system, segmentation of the AP axis proceeds through four tiers of gene activity, termed maternal gradients, gap genes, pair-rule genes, and segment polarity genes [30]. The sequential activity of each tier subdivides the naive blastoderm into smaller domains of gene expression with increasingly sharp boundaries. This culminates in the designation of each row of cells with its own unique set of expressed genes (Fig. 4A). The process is subject to transcriptional noise with a large intrinsic component [35], as well as several other noise sources with different signatures [41–44].

No single value of $\eta_0$ adequately characterizes such readout noise. Nevertheless, we can gain important insight by computing $I_{\rm acc}[c]$ as a function of $\eta_0$, treating it as a variable parameter. The decay of $I_{\rm acc}[c]$ with $\eta_0$ characterizes the tolerance to added noise of the information encoded in the morphogen (or set of morphogens) c. Applied to gene expression data from the early Drosophila segmentation gene network, this analysis shows how our simple model explains the use of multi-tier gradient interpretation in a real system (Fig. 4).

FIG. 4. A: Immunostaining of three anterior-posterior (AP) axis patterning genes in the same embryo. Rather than specifying cell fate directly, the “gap genes” such as hunchback (Hb; top) and Krüppel (Kr; middle) control “pair-rule” genes such as even-skipped (Eve; bottom). Both tiers regulate other genes further downstream. Boxes indicate the selected region of interest (ROI), where at this time Hb and Kr are the only relevant inputs to Eve, as shown in the cartoon. B: Within the ROI (shaded), Eve exhibits higher expression noise than either Hb or Kr. Expression noise is computed as the RMS difference between the expression level of a nucleus and its immediate dorsal or ventral neighbor (see Methods). It is plotted against AP distance from the Hb/Kr boundary (denoted $x_0$). Error bars are standard deviations over N = 8 embryos. C: Idealized morphogen profiles, restricted to the ROI. Profile shapes are obtained as smooth spline fits to expression values and noise magnitudes calculated for the profiles of panel A after projection onto the AP axis. D: For all but the lowest readout noise magnitudes, the joint accessible information content in the triplet (Hb, Kr, Eve) exceeds the accessible information provided by Hb and Kr alone. This holds even in an extreme hypothetical case where Hb and Kr are rendered entirely noiseless.

We focus on a particular node in this network whereby, in early embryos, two gap genes, hb and Kr, regulate a pair-rule gene, eve. For $0.37 < x_{\rm AP} < 0.47$, Kr and hb expression form opposing boundaries. In this region, they are jointly responsible for creating the trough between eve stripes 2 and 3; other inputs to eve are negligible here at this time […]. Protein levels are measured simultaneously in each nucleus by a triple immunostaining experiment (Fig. 4A) in N = 8 single embryos. We determine the expression noise of each gene by comparing levels in a given nucleus with those of its immediate dorsal and ventral neighbors (see Methods).
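The Methods are not part of this section, so the following is only a minimal sketch of the neighbor-comparison idea. It assumes, hypothetically, that nuclei sit on a regular grid stored as an array of shape `(n_dv, n_ap)`, with rows ordered along the dorsal-ventral direction. Real nuclei do not form such a grid. It also assumes that mean expression varies only along AP within each column. If the noise of neighboring nuclei is independent, the RMS difference is $\sqrt{2}$ times the per-nucleus noise. Whether the original analysis rescales by $1/\sqrt{2}$ is not stated here, so the sketch returns the raw RMS difference.

```python
import numpy as np

def neighbor_rms_noise(expr):
    """RMS difference between each nucleus and its immediate dorsal/ventral
    neighbor, computed separately for each AP column.

    `expr` is a hypothetical (n_dv, n_ap) array of expression levels on a
    regular grid, with rows ordered along the dorsal-ventral direction."""
    diff = np.diff(expr, axis=0)                # neighbor differences along DV
    return np.sqrt(np.mean(diff ** 2, axis=0))  # one value per AP column

# Synthetic check: mean expression varies only along AP, plus independent noise
rng = np.random.default_rng(2)
n_dv, n_ap, sigma = 2000, 5, 0.1
profile = np.linspace(0.2, 0.8, n_ap)           # mean expression per AP column
expr = profile + rng.normal(0.0, sigma, size=(n_dv, n_ap))
rms = neighbor_rms_noise(expr)
print(rms, np.sqrt(2) * sigma)                  # approx. sqrt(2) * sigma per column
```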

In the defined region of interest, eve expression noise is higher than the respective noise in hb or Kr expression (Fig. 4B). The information content of eve must therefore be lower than that carried by either of its two inputs. Due to the curvature of the embryo (Fig. 4A), the positional information of a real morphogen is only approximately related to that derived from projection onto the imaginary AP axis. To estimate the information content for each of the three genes, we therefore consider “idealized” Gaussian-noise profiles (Fig. 4C), with mean and noise obtained by smoothing the measured values in real embryos. The idealized profiles are normalized to the same maximum and are, by construction, functions of $x_{\rm AP}$ carrying positional information $I(c(x_{\rm AP}), x_{\rm AP})$. Restricted to the region of interest, the information content of Hb and Kr is 2.6 and 2.7 bits, respectively, whereas the larger noise of Eve reduces its information content to only 2.0 bits. Why, then, does the system use Eve to regulate downstream processes, rather than utilizing Kr and Hb directly?

The answer becomes clear when we consider the accessibility of information encoded in these morphogens, namely $I_{\rm acc}$ as a function of $\eta_0$ (Fig. 4D). A patterning strategy lacking Eve can access only Hb and Kr. Suppose some hypothetical filtering mechanism could reduce their expression noise to an arbitrarily low level. Even then, the readout noise magnitude $\eta_0 > 0$ imposes an upper bound that $I_{\rm acc}[c_{\rm Hb}, c_{\rm Kr}]$ must satisfy. This bound corresponds to the information in a hypothetical pair of noiseless Hb and Kr profiles and cannot be achieved in practice. It is a theoretical best-case scenario for any strategy lacking Eve.

When the readout noise $\eta_0$ is zero, $I_{\rm acc}$ coincides with the raw information content, which for perfectly noiseless Hb and Kr would be infinite. However, as readout noise increases, the performance bound becomes finite and drops quickly (black curve in Fig. 4D). This behavior contrasts with the joint accessible information of the triplet (Hb, Kr, Eve) (magenta), calculated using the actual measured noise of each of the three profiles. The accessible information content in the triplet is, of course, always finite, but it is also more tolerant to readout noise. Owing to the steeper slopes of the Eve profile, the accessible information content of the triplet decreases slowly as $\eta_0$ increases and, importantly, more slowly than the black curve. A crossing point is therefore observed. Its presence does not qualitatively depend on the specifics of the readout noise model (e.g., absolute noise magnitude can be replaced by fractional noise).

Remarkably, although Eve is measurably noisier than either of its inputs, its presence enables the system to access more information than could have been extracted from Hb and Kr alone, even if these inputs could be rendered perfectly noiseless. In practice, the enhancers of the pair-rule genes also contain binding sites for maternal transcription factors [38, 39], which may further increase the precision of gene expression. However, suppose Eve were regulated by Hb and Kr only, and so were fully redundant in the standard information-theoretic sense. Our framework demonstrates that even then, the additional tier would still confer an advantage, because transcription is intrinsically noisy.


Discussion 

The Drosophila patterning network has been described as performing a “transition from analog to digital specification” of cell identity […]. The “digital” metaphor has its limitations: even for Eve, the graded distribution within gene expression domains contains information [8]. Nevertheless, it expresses the correct intuition that the final pattern is more tolerant to noise. Importantly, the standard information-theoretic formalism does not capture this intuition. For instance, the profile depicted in Fig. 3B has the same information content for all A. Noise tolerance, a critically important feature in biological systems, becomes manifest only when the readout process is considered explicitly, for example, as we have done in our definition of accessible information.

This point is implicit in the theoretical work investigating the so-called “input noise” […], but has not been emphasized. In a theoretical discussion of an abstract biochemical circuit, the quantities for which information is computed are easily postulated to be the complete input and the final output. In this manner, valid theoretical results can be derived without a concern for information accessibility (for some recent examples, see [47, 48]). However, information-theoretic arguments are often applied to experimental data where the measured quantity is only an intermediate step, e.g., a transcription factor regulating downstream events. In that setting, the question of information accessibility can no longer be neglected.

For example, it has been suggested that certain signaling circuits may have evolved towards optimal information transmission […]. Although the argument is plausible, applying it in practice requires caution. Consider, once again, the example of a developmental circuit. If the entire set of functional (cell-fate specific) genes were included in the analysis, then information transmission from the input to this entire layer of functional genes would be a plausible objective function for the whole network to maximize, under some “bounded complexity” constraint penalizing solutions where hundreds of cell-fate specific genes are all controlled by highly complex enhancers with combinatorial, cooperative regulation. However, the usual, more economical approach does not consider the full set of hundreds of cell-fate determining genes. Instead, it recognizes that the bulk of the patterning task is accomplished by a small subset of dedicated genes that engage in complex cross-regulation to establish a pattern that all other genes can then interpret simply. If we focus only on this core subset, the “economy of complexity” constraint is conveniently imposed by construction. We must realize, however, that maximizing information transmission to the target genes (downstream of the patterning core) imposes a different requirement on this core circuit than merely efficient information transfer within the core itself. Instead, the core circuit must function as a format converter, re-encoding the information at its input into a format that can be accessed with a simpler and faster readout, namely the readout of a patterning cue by a functional gene.

Curiously, it has been shown that in small networks with a realistic model of noise, maximizing raw information transmission leads to network structures exhibiting features such as tiling of the patterned range with amplifying input/output readouts [49–51], i.e. features that also tend to make information more accessible, even though the optimization scheme employed in these studies did not specifically consider the encoding format. This remarkable coincidence, however, should not obscure the fact that ultimately the two tasks (maximizing information transmission and re-encoding it in a more accessible format) could be in conflict.

Information theory is a powerful tool; its formalism does not, however, aim to replace considerations of what constitutes useful information or how it might be used by the system. As it gains popularity in biological applications, it is important to remember that for a channel $X \to Y$, the relation between the mutual information $I(X, Y)$ and the ability to use $Y$ to determine $X$ is only asymptotic: Shannon […] proved that it is the maximum rate of error-free communication via this channel, in the limit of infinitely many uses of the channel. Importantly, in development and biological signaling, the number of channel uses (e.g. the integration time of the signal) is fundamentally finite [3]. Further, Shannon's results assumed an encoder and decoder of unlimited computational power […]. This asymptotic rate is never actually achieved in practice [52], but in a biological context performance is constrained even further, since the “encoding scheme” is usually limited to measuring the same signal multiple times. In communication theory, this is known as a “repetition code” and is formally classified as a “bad code”, i.e. a code that does not attain Shannon's bound even asymptotically. This means that extracting all the “raw” information from a signal is impossible even in principle. For example, a signaling pathway with a capacity of 1 bit is never sufficient to make a reliable binary decision [3], and therefore should not be conceptualized as a binary switch.
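The repetition-code argument can be made concrete with a small calculation. The sketch below uses a binary symmetric channel with flip probability p, which is our own illustrative choice rather than a model of any pathway discussed here. Majority voting over n repeated uses of the channel drives the decoding error down only as the rate 1/n goes to zero. By contrast, Shannon's theorem guarantees arbitrarily reliable communication at any fixed rate below capacity, given an ideal encoder and decoder.

```python
from math import comb, log2


def bsc_capacity(p):
    """Capacity, in bits per channel use, of a binary symmetric channel that
    flips each transmitted bit with probability p."""
    if p in (0.0, 1.0):
        return 1.0
    return 1.0 + p * log2(p) + (1.0 - p) * log2(1.0 - p)


def repetition_error(p, n):
    """Probability that a majority vote over n (odd) independent channel uses
    decodes the wrong bit, i.e. that more than half of the copies are flipped."""
    if n % 2 == 0:
        raise ValueError("use an odd number of repetitions to avoid ties")
    return sum(comb(n, k) * p**k * (1.0 - p) ** (n - k)
               for k in range((n + 1) // 2, n + 1))


p = 0.1
print(f"capacity: {bsc_capacity(p):.3f} bits per use")
for n in (1, 3, 9, 27):
    # a repetition code transmits one bit per n channel uses: rate = 1/n
    print(f"n={n:2d}  rate={1 / n:.3f}  error={repetition_error(p, n):.2e}")
```

For p = 0.1 the capacity is about 0.53 bits per use. A repetition code, however, must spend ever more channel uses per bit to push the error down.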

As illustrated here, the distinction between “raw” and “accessible” information will be crucial for understanding the architecture and function of patterning and signaling circuits. More work is required: our definition of accessible information relied on a simplistic noise model. In general, quantifying the usefulness of information-bearing signals in contexts where channel uses are limited will require reinstating considerations of the rate/fidelity tradeoff, which Shannon could eliminate by taking the limit of infinite-time communication. Nevertheless, information theory remains a well-suited framework for addressing these issues, provided it is extended to quantify both the amount and the accessibility of information. Our work provides a step in this direction and demonstrates how the extended framework naturally explains a global architectural property shared by diverse patterning circuits.
Supplementary Information


Information carried by a linear morphogen 
gradient 


Estimating expression magnitude (image 
processing) 


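For a linear morphogen $c(x)$ spanning the range $[0, c_{\max}]$, with constant Gaussian noise $\sigma_0$, the information content is given by

$$I_{\rm raw}[c] = I[c(x), x] = \ln\frac{c_{\max}}{\sigma_0\sqrt{2\pi e}}.$$

To show this, we apply the definition of the mutual information:

$$I[c(x), x] = H[P_c] - H[P_{c|x}].$$

Here $P_c$ is the probability distribution of $c$, which is uniform between 0 and $c_{\max}$ (up to edge effects that are negligible when $\sigma_0 \ll c_{\max}$). $P_{c|x}$ is the conditional distribution of the concentration $c$ given $x$, which is Gaussian of width $\sigma_0$. $H[P]$ is the differential entropy of a probability distribution $P$:

$$H[P] = -\int P(z)\ln P(z)\,dz = -\langle \ln P \rangle_P.$$

Clearly, $H[P_c] = \ln c_{\max}$. The second term is the entropy of a Gaussian distribution $P_{\sigma_0}$ of width $\sigma_0$:

$$P_{\sigma_0}(z) = \frac{1}{\sqrt{2\pi}\,\sigma_0}\exp\left(-\frac{z^2}{2\sigma_0^2}\right),$$

and therefore:

$$H[P_{c|x}] = -\langle \ln P_{\sigma_0}(z)\rangle_z = \ln\left(\sqrt{2\pi}\,\sigma_0\right) + \frac{\langle z^2\rangle}{2\sigma_0^2} = \ln\left(\sqrt{2\pi}\,\sigma_0\right) + \frac{1}{2} = \ln\left(\sigma_0\sqrt{2\pi e}\right). \tag{8}$$

Putting this together, we find:

$$I[c(x), x] = H[P_c] - H[P_{c|x}] = \ln\frac{c_{\max}}{\sigma_0\sqrt{2\pi e}}.$$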
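As a numerical sanity check of this result, the sketch below compares the closed form with a direct evaluation. In the direct evaluation, $P_c$ is not approximated as exactly uniform; it is the uniform distribution convolved with the Gaussian noise. This is a minimal illustration assuming NumPy. The function names and grid settings are hypothetical choices. Entropies are in nats; divide by ln 2 for bits.

```python
import numpy as np
from math import e, erf, log, pi, sqrt


def raw_information_closed_form(c_max, sigma0):
    """I_raw = ln(c_max / (sigma0 * sqrt(2 pi e))), in nats; valid for sigma0 << c_max."""
    return log(c_max / (sigma0 * sqrt(2.0 * pi * e)))


def raw_information_numeric(c_max, sigma0, n=20001):
    """Same quantity without approximating P_c as exactly uniform.

    With x uniform and c(x) linear, P_c is the uniform density on [0, c_max]
    convolved with Gaussian noise of width sigma0.
    """
    c = np.linspace(-8.0 * sigma0, c_max + 8.0 * sigma0, n)
    dc = c[1] - c[0]
    norm_cdf = np.vectorize(lambda t: 0.5 * (1.0 + erf(t / sqrt(2.0))))
    p_c = (norm_cdf(c / sigma0) - norm_cdf((c - c_max) / sigma0)) / c_max
    p_c = p_c[p_c > 0]                               # 0 * ln 0 = 0
    h_c = -np.sum(p_c * np.log(p_c)) * dc            # H[P_c] by a Riemann sum
    h_c_given_x = log(sigma0 * sqrt(2.0 * pi * e))   # Gaussian entropy, Eq. (8)
    return h_c - h_c_given_x


closed = raw_information_closed_form(1.0, 0.01)
numeric = raw_information_numeric(1.0, 0.01)
print(f"closed form: {closed:.4f} nats = {closed / log(2):.3f} bits")
print(f"numeric:     {numeric:.4f} nats")
```

The numeric value comes out slightly larger than the closed form. The excess arises at the edges of the concentration range, where noise smears the uniform distribution, and it vanishes as $\sigma_0/c_{\max} \to 0$.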



Experimental procedures 

Antibody staining was performed using procedures and antisera described in [1] and [2]. Confocal microscopy was performed at 12-bit resolution on a Leica SP5 with a 20x HC PL APO NA 0.7 immersion objective at 1.4x zoom, using pixels of size 135 × 135 nm covering an area of 554 × 554 μm. For each embryo, 17 image slices were obtained at a z interval of 4 μm, spanning approximately 50% of the embryo thickness. All data were collected in a single acquisition cycle using identical scanning parameters.


The immunostaining procedure described above yields confocal stacks of images in which pixel intensity corresponds to the recorded fluorescence level. Stacks were converted into projected Hb, Kr and Eve images (such as the one displayed in Fig. 4A) by taking the maximum projection of Gaussian-smoothed frames. The width of the averaging kernel (8 pixels, corresponding to approximately 1 μm) was smaller than the radius of the nuclei. Therefore, for pixels close to the nucleus center, the averaging volume was wholly within the nucleus. Smoothing frames prior to maximum projection ensured robustness against imaging noise.
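The projection step can be sketched in NumPy, assuming a stack ordered as (z, y, x). Two details are our assumptions rather than statements from the text. First, the 8-pixel kernel width is treated as the Gaussian standard deviation. Second, image borders are handled by replicating edge values. A dedicated image-processing library could replace the hand-written separable filter.

```python
import numpy as np


def gaussian_kernel(sigma_px):
    """Normalized 1D Gaussian kernel truncated at 3 standard deviations."""
    radius = int(np.ceil(3.0 * sigma_px))
    t = np.arange(-radius, radius + 1, dtype=float)
    k = np.exp(-0.5 * (t / sigma_px) ** 2)
    return k / k.sum()


def _convolve_1d(v, k):
    # pad by replicating edge values so the output has the same length as v
    r = (len(k) - 1) // 2
    return np.convolve(np.pad(v, r, mode="edge"), k, mode="valid")


def smooth_frame(frame, sigma_px):
    """Separable 2D Gaussian smoothing: filter along rows, then along columns."""
    k = gaussian_kernel(sigma_px)
    tmp = np.apply_along_axis(_convolve_1d, 1, frame.astype(float), k)
    return np.apply_along_axis(_convolve_1d, 0, tmp, k)


def projected_image(stack, sigma_px=8.0):
    """Maximum projection over z of Gaussian-smoothed frames; stack is (z, y, x)."""
    return np.max(np.stack([smooth_frame(f, sigma_px) for f in stack]), axis=0)
```

Smoothing before projecting means that a single bright noisy pixel in one slice cannot dominate the maximum on its own.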

FIG. S1. Example of a projected image (Eve). The black polygon indicates the analysis region, manually selected to exclude distorted areas close to the embryo edge. The rectangle indicates nuclei with the same projected coordinate onto the AP axis. Even in this perfectly ventral view of the embryo, which minimizes the effects of stripe curvature (compare with Fig. 4A in the main text), the expression stripes are not exactly perpendicular to this axis.

In each of N = 8 embryos, the locations of nuclei were identified manually. For each of the projected images (Hb, Kr and Eve), we recorded the highest intensity value within 5 pixels of each nucleus center location as the fluorescence intensity in that nucleus. Allowing for a 5-pixel “wiggle room” ensured robustness against registration errors across color channels, as well as against errors in the manual selection of nucleus center locations. The recorded intensity values were corrected for background autofluorescence by subtracting the mean intensity recorded in nuclei located in non-expressing regions of the embryo. The background-corrected fluorescence values reflect protein concentration, up to a proportionality factor (the intensity of a fluorophore). The fractional measurement noise in estimating relative concentrations can be estimated as the standard deviation of pixel intensity values within a nucleus on the projected map. In their respective regions of expression, this standard deviation of Hb, Kr and Eve pixel intensity constituted ≈ 1% of the expression value. It was therefore negligible compared to the expression noise observed across nuclei (Fig. 4B). Tissue curvature and compression cause signal distortion artifacts at the edges of the imaged portion of the embryo. To avoid them, all analysis was restricted to nuclei located in the low-distortion region selected manually along the imaged embryo center line, typically 20–25 nuclei wide (Fig. S1).
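The per-nucleus readout can be sketched as follows. This is a hypothetical NumPy implementation that assumes (row, col) pixel coordinates for the manually identified nucleus centers. It also interprets “within 5 pixels” as a disk of radius 5. The function names are illustrative, not the authors' code.

```python
import numpy as np


def _disk(shape, center, radius):
    """Boolean mask of pixels within `radius` of `center` = (row, col)."""
    rows, cols = np.indices(shape)
    return (rows - center[0]) ** 2 + (cols - center[1]) ** 2 <= radius ** 2


def nucleus_intensities(image, centers, radius=5):
    """Highest projected-image value within `radius` pixels of each nucleus center."""
    return np.array([image[_disk(image.shape, c, radius)].max() for c in centers],
                    dtype=float)


def background_corrected(values, non_expressing):
    """Subtract the mean value of nuclei flagged as lying in a non-expressing region."""
    values = np.asarray(values, dtype=float)
    mask = np.asarray(non_expressing, dtype=bool)
    return values - values[mask].mean()


def fractional_measurement_noise(image, center, radius):
    """Standard deviation of pixel values within one nucleus, relative to their mean."""
    pix = image[_disk(image.shape, center, radius)]
    return pix.std() / pix.mean()
```

Taking the maximum over a small disk, rather than the value at the center pixel, is what provides tolerance to small registration and center-picking errors.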


Estimating expression noise (Fig. 4B) 

Expression noise is defined as

$$c_{\rm noise} = c_{\rm recorded} - c_{\rm expected},$$

where $c_{\rm recorded}$ is the recorded fluorescence intensity (of Hb, Kr or Eve), and $c_{\rm expected}$ is the expected value at that location. Measuring noise therefore requires a method for constructing $c_{\rm expected}$. We use a method that we call “haltere-shaped filtering”. To introduce and motivate this method, we begin by discussing two simpler alternatives and their limitations: binning by AP coordinate and neighbor averaging.


Binning by AP coordinate 

Since gap gene expression is often said to be a function of the location along the antero-posterior (AP) axis, one approach could be to define $c_{\rm expected}$ as the average expression level in all nuclei with a similar AP coordinate. This approach, however, would yield strongly biased results due to the curvature of gene expression domains (Fig. S1).

FIG. S2. The simple neighbor-averaging method will underestimate $c_{\rm expected}$ in regions where the profile is concave, e.g. at the peaks of Eve stripes (nucleus X), and overestimate $c_{\rm expected}$ where the profile is convex, e.g. in the Eve troughs (nucleus Y). A: Eve stripes 2 and 3. Nuclei X and Y are marked by smaller circles; the large circles encompass the neighbors over which averaging is performed. B: $c_{\rm noise}$ as estimated using the neighbor-averaging method, shown as a function of AP coordinate. Black line: window average of $c_{\rm noise}$ over 50 consecutive nuclei. This average should be close to zero for an unbiased estimate, but it exhibits a clear correlation with the Eve profile shape.


Neighbor averaging 

A better approach is to construct $c_{\rm expected}$ for each nucleus based on the expression levels observed in neighboring nuclei. Since expression profiles are relatively smooth functions of location, the average of expression levels in nuclei that are immediate neighbors of nucleus $i$ provides a reasonable expectation for $c_i$. Although it is a significant improvement over the naive AP-based method, simple averaging over neighbors provides an unbiased estimate only in regions where the profile shape is well approximated by a linear dependence. In all other cases, this estimate has a bias proportional to the curvature (second derivative) of the mean profile shape. This is particularly clear for the sharply varying profile of Eve (Fig. S2A). This bias can lead to a dangerous artifact whereby sharply varying profiles appear noisier, which would be unacceptable for our analysis of the Hb-Kr-Eve system. Fig. S2B shows the inferred $c_{\rm noise}$ as a function of AP axis coordinate. The severity of the bias is revealed by the clear correlation between $c_{\rm noise}$ and the average profile shape of Eve (i.e. $c_{\rm recorded}$).
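The bias of neighbor averaging follows from a Taylor expansion. On a 1D lattice with spacing $h$, $\tfrac12[f(x-h)+f(x+h)] - f(x) = \tfrac12 f''(x)h^2 + O(h^4)$. The NumPy sketch below checks this on a 1D lattice, a simplification of the 2D layer of nuclei that we introduce for illustration. For a linear profile the bias vanishes. At a concave peak, $c_{\rm expected}$ is underestimated.

```python
import numpy as np


def neighbor_average(values):
    """c_expected at each interior site of a 1D lattice: mean of its two neighbors."""
    v = np.asarray(values, dtype=float)
    return 0.5 * (v[:-2] + v[2:])


h = 0.1                                   # lattice spacing (arbitrary units)
x = np.arange(-1.0, 1.0 + h / 2, h)       # lattice sites

profiles = {
    "linear": (lambda t: 2.0 * t + 1.0, 0.0),   # (profile, second derivative)
    "peak": (lambda t: -t**2, -2.0),            # concave, like an Eve stripe peak
}
for name, (f, f2) in profiles.items():
    bias = neighbor_average(f(x)) - f(x)[1:-1]
    # Taylor expansion predicts bias = f''(x) * h**2 / 2 (exact for quadratics)
    print(f"{name}: mean bias {bias.mean():+.4f}, predicted {f2 * h**2 / 2:+.4f}")
```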


Haltere-shaped filtering 

We now describe the procedure we used to construct $c_{\rm expected}$ for our analysis. We begin by creating an “expression map”: in a projected image such as the one depicted in Fig. S1, the value of every pixel is replaced by the expression level $c_{\rm recorded}$ recorded in the nucleus closest to that pixel. The image is then filtered using the haltere-shaped filter depicted in Fig. S3A, and the pixel values at each nucleus after filtering define the values of $c_{\rm expected}$.

This method combines the better qualities of the two approaches discussed above. On a perfectly regular hexagonal lattice, it would be equivalent to the neighbor-averaging method using only the immediate dorsal and ventral neighbors, but the specific procedure we described naturally deals with lattice imperfections. In fact, $c_{\rm noise}$ in Fig. S2B was constructed using this exact procedure, but with the annulus-shaped filter depicted in Fig. S2A. The gradient of expression profiles is predominantly aligned with the AP axis, so expression varies little along the dorsal-ventral direction spanned by the haltere. As a result, a haltere-shaped filter greatly reduces any introduced bias (Fig. S3B).
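The expression map and the haltere filter can be sketched together with an annulus filter for comparison. Several choices here are assumptions not specified in the text. Nuclei sit on a square lattice in the toy example. The AP axis is horizontal in the image, so the haltere is vertical. The disk radius and offset are free parameters, chosen here to match the toy lattice spacing. On this toy profile, which varies only along AP, the haltere estimate is exact. The annulus estimate, by contrast, is biased upward because the profile is convex.

```python
import numpy as np


def expression_map(shape, centers, values):
    """Replace every pixel by the value recorded in the nearest nucleus (brute force)."""
    rows, cols = np.indices(shape)
    pix = np.stack([rows.ravel(), cols.ravel()], axis=1).astype(float)  # (P, 2)
    ctr = np.asarray(centers, dtype=float)                               # (N, 2)
    d2 = ((pix[:, None, :] - ctr[None, :, :]) ** 2).sum(axis=2)          # (P, N)
    return np.asarray(values, dtype=float)[d2.argmin(axis=1)].reshape(shape)


def haltere_expected(emap, centers, offset, radius):
    """c_expected as the mean of the map over two disks centered `offset` pixels above
    and below each nucleus. Assumes the AP axis is horizontal in the image, so the
    haltere is vertical (DV direction); the nucleus itself is excluded."""
    rows, cols = np.indices(emap.shape)
    out = []
    for r0, c0 in centers:
        mask = np.zeros(emap.shape, dtype=bool)
        for dr in (-offset, offset):
            mask |= (rows - (r0 + dr)) ** 2 + (cols - c0) ** 2 <= radius ** 2
        out.append(emap[mask].mean())
    return np.array(out)


def annulus_expected(emap, centers, r_in, r_out):
    """For comparison: mean of the map over a ring around each nucleus (all neighbors)."""
    rows, cols = np.indices(emap.shape)
    out = []
    for r0, c0 in centers:
        d2 = (rows - r0) ** 2 + (cols - c0) ** 2
        out.append(emap[(d2 >= r_in ** 2) & (d2 <= r_out ** 2)].mean())
    return np.array(out)


# Toy example: square lattice of nuclei (spacing 10 px) in a 51 x 51 image, with a
# convex profile that depends only on the column (AP) coordinate.
centers = [(r, c) for r in range(5, 50, 10) for c in range(5, 50, 10)]
recorded = np.array([(c / 10.0) ** 2 for r, c in centers])
emap = expression_map((51, 51), centers, recorded)
noise_haltere = recorded - haltere_expected(emap, centers, offset=10, radius=3)
noise_annulus = recorded - annulus_expected(emap, centers, r_in=7, r_out=13)
```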

One might expect that, for even higher accuracy, the orientation of the haltere filter could be set not by perpendicularity to the imaginary AP axis, but by the isolines of the actual expression profile after sufficiently strong smoothing. In practice, however, such an approach is functionally less robust due to the number of tunable parameters. We empirically found fixed-angle haltere filtering to result in the lowest bias, as measured by the correlation between the average $c_{\rm noise}$ in a region and the average $c_{\rm recorded}$ in that same region.

FIG. S3. A: “Eve map” of the region depicted in Fig. S2A, constructed as described in the text. X and Y label the same nuclei as in Fig. S2A; the larger circle marks their location. The smaller circles depict the haltere-shaped filter: $c_{\rm expected}$ is constructed as the average pixel value over this area around each nucleus. B: Inferred $c_{\rm noise}$ shown as a function of AP coordinate. The haltere-filtering method shows a marked improvement compared to annulus filtering (Fig. S2B), as indicated by the greatly reduced fluctuations of the window-averaged $c_{\rm noise}$ (in black). The increase in the magnitude of $c_{\rm noise}$ in regions of greater expression is expected: larger expression means larger absolute noise.

Idealized profiles (Fig. 4C)

The expression profiles of long body axis patterning genes in Drosophila form a pattern that, to a good approximation, can be considered one-dimensional. However, as discussed above, the curvature of expression profiles means that $x_{\rm AP}$ is not the variable that best captures the variance. To estimate positional information in a gene expression pattern using data from single embryos, we therefore use the measured expression pattern shape and noise to construct what we call “idealized profiles”. First, we plot the recorded expression values $c_{\rm recorded}$ as a function of $x_{\rm AP}$ and construct a smooth spline fit that captures the mean profile shape; we denote the result $\mu(x_{\rm AP})$. Next, the same procedure is applied to expression noise, estimated as described above: the smooth spline fit to $c_{\rm noise}^2$ as a function of $x_{\rm AP}$ describes how the experimentally observed expression noise varies along the AP axis. We denote the corresponding root-mean-square deviation function $\epsilon(x_{\rm AP})$. An expression pattern with mean $\mu(x_{\rm AP})$ and independent Gaussian noise of magnitude $\epsilon(x_{\rm AP})$ constitutes the “idealized profile” of a given patterning cue (see Fig. 4C).

Note that when calculating the average noise magnitude for a given AP coordinate, expression noise is calculated as described in the previous section, i.e. prior to binning by AP. The result is the average of expression noise measured locally for all nuclei at a similar AP location. This differs from the variance of expression among all nuclei at the same $x_{\rm AP}$, which, as we described, suffers from artifacts. The procedure we described effectively straightens out expression stripes: the resulting profile has the same mean and noise magnitude as observed experimentally, but is, by construction, a function of a single variable. This approach contrasts with the procedure of [1], where embryos were imaged in cross-section and only dorsal or ventral “expression profiles” were used, i.e. expression levels were recorded along a particular AP line (from multiple embryos). Here, we use all nuclei observed on a slightly flattened surface of a single embryo, and the variation of expression profile shape with the dorsal-ventral coordinate becomes a major factor.

Computing information content (Fig. 4D)

By definition, the information content (or the mutual information) $I(x, c)$ of a profile $c(x)$ is the average reduction of uncertainty about $c$ after $x$ becomes known:

$$I(x, c) = S(c) - \langle S(c|x)\rangle_x.$$

Here the first term is the entropy of the full distribution of $c$, which we denote $P_c$, and $S(c|x)$ is the entropy of the conditional distribution $P(c|x)$. We write:

$$P_c(c) = \int P(c|x)\,P_x(x)\,dx = \frac{1}{x_{\max} - x_{\min}}\int_{x_{\min}}^{x_{\max}} P(c|x)\,dx,$$

because the position $x$ is uniformly distributed between $x_{\min}$ and $x_{\max}$ (in our case, $x_{\rm AP}^{\min} = 0.37$ and $x_{\rm AP}^{\max} = 0.47$).

These formulas express the information content of a one-dimensional profile entirely in terms of the conditional probability function $P(c|x)$. For the idealized profile, at a given AP location $x_0$, the conditional distribution $P(c|x_0)$ is Gaussian with mean $\mu(x_0)$ and width $\epsilon(x_0)$. In particular, the entropy of $P(c|x_0)$ is known analytically, $S(c|x_0) = \ln\left(\epsilon(x_0)\sqrt{2\pi e}\right)$, cf. Eq. (8). We therefore compute $I(x, c)$ by performing the remaining integrals numerically. We validated our code by computing the information content of simple profiles for which it can also be calculated analytically.
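The computation can be sketched for an idealized profile whose mean $\mu(x)$ and Gaussian width $\epsilon(x)$ are supplied as callables. In the text these come from spline fits, which we do not reproduce here. This is a minimal NumPy illustration with hypothetical function and grid parameters. $P_c$ is obtained by averaging the Gaussians over a uniform grid in $x$, $S(c)$ by a Riemann sum, and $\langle S(c|x)\rangle_x$ from the analytic Gaussian entropy. In the spirit of the validation mentioned above, a linear profile with constant noise should approximately reproduce the closed-form result for a linear morphogen gradient.

```python
import numpy as np


def idealized_profile_information(mu, eps, x_min, x_max, n_x=1001, n_c=2001):
    """Mutual information I(x, c), in nats, of an idealized profile.

    mu(x) and eps(x) give the mean and the Gaussian noise width at position x;
    x is assumed uniformly distributed on [x_min, x_max].
    """
    x = np.linspace(x_min, x_max, n_x)
    m = np.asarray(mu(x), dtype=float) * np.ones_like(x)    # broadcast constants
    s = np.asarray(eps(x), dtype=float) * np.ones_like(x)
    # concentration grid wide enough to hold essentially all probability mass
    c = np.linspace((m - 6.0 * s).min(), (m + 6.0 * s).max(), n_c)
    dc = c[1] - c[0]
    # P(c|x) on the grid: one row per position, one column per concentration
    z = (c[None, :] - m[:, None]) / s[:, None]
    p_c_given_x = np.exp(-0.5 * z**2) / (np.sqrt(2.0 * np.pi) * s[:, None])
    # P_c(c) = (x_max - x_min)^-1 * integral of P(c|x) dx, approximated by a grid mean
    p_c = p_c_given_x.mean(axis=0)
    p_c = p_c[p_c > 0]                                      # 0 * ln 0 = 0
    s_c = -np.sum(p_c * np.log(p_c)) * dc                   # S(c)
    # <S(c|x)>_x from the analytic Gaussian entropy ln(eps * sqrt(2 pi e))
    s_c_given_x = np.mean(np.log(s * np.sqrt(2.0 * np.pi * np.e)))
    return s_c - s_c_given_x


# Validation: linear profile with constant noise vs. the linear-gradient closed form
estimate = idealized_profile_information(lambda x: x, lambda x: 0.01, 0.0, 1.0)
closed_form = np.log(1.0 / (0.01 * np.sqrt(2.0 * np.pi * np.e)))
print(f"numeric {estimate:.3f} nats, closed form {closed_form:.3f} nats")
```

A constant profile carries no positional information, which gives a second simple check: its estimate should be close to zero.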